30 research outputs found

    Estrategias para el acceso a contenidos Web mediante habla (Strategies for accessing Web content through speech)

    The aim of this thesis is to design and evaluate different strategies for accessing web content through speech. The work focuses on reusing existing web content and on designing a spoken interaction that is fast and user-friendly. In the first phase, the problem of generic conversion of web content for access through a voice browser was analyzed, and two alternatives were proposed: automatic conversion and semi-automatic conversion. In the second phase, the use of a spoken dialogue system was proposed for accessing web content in restricted domains. The proposal is based on an information model and an interaction model. In the third phase, several experiments were carried out with a speech-driven information retrieval system, and several improvements were proposed that increase the system's performance. Departamento de Informátic

    Acoustic characterization and perceptual analysis of the relative importance of prosody in speech of people with Down syndrome

    Many studies identify important deficits in the voice production of people with Down syndrome. These deficits affect not only the spectral domain, but also intonation, accent, rhythm and speech rate. The main aim of this work is the identification of the acoustic features that characterize the speech of people with Down syndrome, taking into account the frequency, energy, temporal and spectral domains. A second aim is to compare the relative weight of these features in characterizing the speech of people with Down syndrome. The openSMILE toolkit with the GeMAPS feature set was used to extract acoustic features from a speech corpus of utterances from typically developing individuals and individuals with Down syndrome. Then, the most discriminant features were identified using statistical tests. Moreover, three binary classifiers were trained using these features. The best classification rate is 87.33% using only spectral features, and 91.83% using frequency, energy and temporal features. Finally, a perception test was performed using recordings created with a prosody transfer algorithm: the prosody of utterances from one group of speakers was transferred to utterances of another group. The results of this test show the importance of intonation and rhythm in the identification of a voice as non-typical. In conclusion, the results point to prosody training as a way to improve the quality of the speech production of people with Down syndrome.

    Using challenges to enhance a learning game for pronunciation training of English as a second language

    Learning games have remarkable potential for education. They provide an emergent form of social participation whose usefulness and efficiency in learning processes deserve assessment. This study describes a novel learning game for foreign pronunciation training in which players can challenge each other. Native Spanish speakers performed several pronunciation activities during a one-month competition using a mobile application, designed under a minimal pairs approach, to improve their pronunciation of English as a foreign language. The game took place in a competitive scenario in which students had to challenge other participants in order to get high scores and climb a leaderboard. Results show intense practice, supported by a significant number of activities and regular play, and the most active and motivated players in the competition achieved significant pronunciation improvement. The integration of automatic speech recognition (ASR) and text-to-speech (TTS) technology allowed users to improve their pronunciation while being immersed in a highly motivational game. Ministerio de Economía, Industria y Competitividad - Fondo Europeo de Desarrollo Regional (grant TIN2014-59852-R); Junta de Castilla y Leon (grant VA050G18

    IberSPEECH 2020: XI Jornadas en Tecnología del Habla and VII Iberian SLTech

    IberSPEECH2020 is a two-day event, bringing together the best researchers and practitioners in speech and language technologies in Iberian languages to promote interaction and discussion. The organizing committee has planned a wide variety of scientific and social activities, including technical paper presentations, keynote lectures, presentations of projects, laboratory activities, recent PhD theses, discussion panels, a round table, and awards for the best thesis and papers. The program of IberSPEECH2020 includes a total of 32 contributions, distributed among 5 oral sessions, a PhD session, and a projects session. To ensure the quality of all the contributions, each submitted paper was reviewed by three members of the scientific review committee. All the papers in the conference will be accessible through the International Speech Communication Association (ISCA) Online Archive. Paper selection was based on the scores and comments provided by the scientific review committee, which includes 73 researchers from different institutions (mainly from Spain and Portugal, but also from France, Germany, Brazil, Iran, Greece, Hungary, Czech Republic, Ukraine, Slovenia). Furthermore, extended versions of selected papers will be published as a special issue of the Journal of Applied Sciences, “IberSPEECH 2020: Speech and Language Technologies for Iberian Languages”, published by MDPI with fully open access. In addition to regular paper sessions, the IberSPEECH2020 scientific program features the following activities: the ALBAYZIN evaluation challenge session. Red Española de Tecnologías del Habla. Universidad de Valladoli

    ISCA Workshop on Speech and Language Technology in Education (SLATE)

    This paper introduces the architecture and interface of a serious game intended for pronunciation training and assessment for Spanish students of English as a second language. Users face a challenge consisting of the pronunciation of a minimal-pair word battery. Android ASR and TTS tools prove useful in discerning three different pronunciation proficiency levels, ranging from basic to native. Results also provide evidence of the weaknesses and limitations of present-day technologies. These must be taken into account when defining game dynamics for pedagogical purposes. MEC-FEDER Grant TIN2014-59852-R and Junta de Castilla y León Regional Grant VA145U1

    Analysis of atypical prosodic patterns in the speech of people with Down syndrome

    The speech of people with Down syndrome (DS) shows prosodic features which are distinct from those observed in the oral productions of typically developing (TD) speakers. Although a different prosodic realization does not necessarily imply wrong expression of prosodic functions, atypical expression may hinder communication skills. The focus of this work is to ascertain whether this can be the case in individuals with DS. To do so, we analyze the acoustic features that best characterize the utterances of speakers with DS when expressing prosodic functions related to emotion, turn-end and phrasal chunking, comparing them with those used by TD speakers. An oral corpus of speech utterances has been recorded using the PEPS-C prosodic competence evaluation tool. We use automatic classifiers to show that the prosodic features that best predict prosodic functions in TD speakers are less informative in speakers with DS. Although atypical features are observed in speakers with DS when producing prosodic functions, the intended prosodic function can be identified by listeners and, in most cases, the features correctly discriminate the function with analytical methods. However, a greater difference between the minimal pairs presented in the PEPS-C test is found for TD speakers in comparison with DS speakers. The proposed methodological approach provides, on the one hand, an identification of the set of features that distinguish the prosodic productions of DS and TD speakers and, on the other, a set of target features for therapy with speakers with DS. Ministerio de Economía, Industria y Competitividad - Fondo Europeo de Desarrollo Regional (grant TIN2017-88858-C2-1-R); Junta de Castilla y León (grant VA050G18

    Speech Prosody

    An automatic labeling system using Sp ToBI annotation conventions has been applied both to a non-native corpus of Japanese speakers using Spanish and to a reference corpus of Spanish speakers. A set of metrics based on conditional entropy, computed from the output of the automatic labeler, turns out to be highly correlated with the ratings assigned by a team of evaluators. An analysis of the relative frequencies in the use of each of the Sp ToBI symbols makes it possible to identify the recurrent mistakes in the productions of non-native speakers. The results suggest that the majority of the observed prosodic deficits can be explained by prosodic transfer between the Japanese and Spanish systems, as previously reported in the state of the art. MEC-FEDER Grant TIN2014-59852-R and Junta de Castilla y León Regional Grant VA145U1

    Automatic assessment of non-native prosody by measuring distances on prosodic label sequences

    The aim of this paper is to investigate how automatic prosodic labeling systems contribute to the evaluation of non-native pronunciation. In particular, it examines the efficiency of a group of metrics to evaluate the prosodic competence of non-native speakers, based on the information provided by sequences of labels in the analysis of both native and non-native speech. A group of Sp ToBI labels was obtained by means of an automatic labeling system for the speech of native and non-native speakers who read the same texts. The metrics assessed the differences in the prosodic labels for both speech samples. The results showed the efficiency of the metrics in setting apart both groups of speakers. Furthermore, they showed how non-native speakers (American and Japanese speakers) improved their Spanish productions after doing a set of listening and repeating activities. Finally, this study also shows that the results provided by the metrics are correlated with the scores given by human evaluators on the productions of the different speakers.

    Engaging adolescents with Down syndrome in an educational video game

    This article describes the design, implementation and evaluation of an educational video game that helps individuals with Down syndrome to improve their speech skills, specifically those related to prosody. Special attention has been paid to the design of the user interface, taking into account the cognitive, learning, and attentional limitations of people with Down syndrome. The learning content is conveyed by activities of production and perception of prosodic phenomena, aimed at increasing their communicative competence. These activities are introduced within the narrative of a video game so that the players do not perceive the tool as a mere succession of learning activities, but learn and improve their speech while playing. The evaluation strategy involves real users and combines different evaluation activities. Results show a high level of acceptance by participants and also by professionals, speech therapists, and special education teachers. 2018-09-01. MEC-FEDER Grant TIN2014-59852-R and Junta de Castilla y León Regional Grant VA145U1

    Applying a fuzzy classifier to generate Sp ToBI annotation : preliminar results

    One of the goals of the Glissando research project is to enrich a radio news corpus [1] with Sp ToBI labels. In this paper we present the application of the automatic predictions of a fuzzy classifier to speed up the labeling process. The strategy is proposed after completing the following steps: a) manual annotation of a part of the Glissando corpus with Sp ToBI labels and checking of the coherence of the labels; b) training of the automatic system; c) validation or correction of the automatic system's predictions by a human expert. The automatic judgments of the classifier are enriched with confidence measures that are useful to represent uncertain situations concerning the label to be assigned. The main aim of the paper is to show that there is a correspondence between the uncertain situations identified during an inter-transcriber experiment and the uncertain situations that the fuzzy classifier detects. The reduction in labeling time encourages the use of this strategy.